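Videos captured by rolling-shutter (RS) cameras contain spatially distorted frames, and these distortions become significant under fast camera or scene motion. Undoing RS effects is sometimes treated as a spatial problem, in which objects must be rectified or displaced to produce the correct global-shutter (GS) frame. However, the cause of the RS effect is inherently temporal, not spatial. In this paper, we propose a space-time solution to the RS problem. We observe that although an RS video and its corresponding GS video differ severely in their xy frames, they tend to share exactly the same xt slices, up to a known sub-frame temporal shift. Moreover, despite the strong temporal aliasing within each video, they share the same distribution of small 2D xt-patches. This allows the GS output video to be constrained by video-specific constraints imposed by the RS input video. Our algorithm consists of 3 main components: (i) dense temporal upsampling between consecutive RS frames using an off-the-shelf method (trained on regular video sequences), from which we extract GS "proposals"; (ii) learning to correctly merge such GS "proposals" using a dedicated MergeNet; (iii) a video-specific zero-shot optimization that imposes similarity of xt-patches between the GS output video and the RS input video. Our method obtains state-of-the-art results on benchmark datasets, both numerically and visually, despite being trained on a small synthetic RS/GS dataset. Moreover, it generalizes well to new, complex RS videos with motion types outside the training distribution (e.g., complex non-rigid motions), which competing methods trained on much more data cannot handle well. We attribute these generalization capabilities to the combination of external and internal constraints.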
translated by Google Translate
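To make the xt-slice observation concrete, here is a toy rolling-shutter simulator in Python (NumPy). It is a sketch under an assumed readout model (rows read top to bottom over one frame interval, quantized to `sub` time steps); `simulate_rolling_shutter` and its parameters are hypothetical and not part of the described method. It shows that row y of the RS video, viewed as an xt slice, is the GS xt slice of the same row delayed by a known sub-frame offset.

```python
import numpy as np


def simulate_rolling_shutter(gs_fine, sub):
    """Synthesize a rolling-shutter (RS) video from a temporally dense GS video.

    gs_fine: (T_fine, H, W) global-shutter frames, `sub` of them per RS frame.
    Assumed readout model: row y of RS frame t is captured at fine time index
    t * sub + (y * sub) // H, i.e. rows are read top to bottom over one frame.
    """
    t_fine, h, w = gs_fine.shape
    n_frames = t_fine // sub
    rs = np.empty((n_frames, h, w), dtype=gs_fine.dtype)
    for y in range(h):
        offset = (y * sub) // h  # known sub-frame delay of row y
        # The xt slice of row y is the GS xt slice of the same row, shifted by
        # `offset` fine steps and sampled once per RS frame.
        rs[:, y, :] = gs_fine[offset:offset + n_frames * sub:sub, y, :]
    return rs
```

For a static scene every row sees the same content, so RS and GS frames coincide; the xy frames become distorted only under motion, while each row's xt slice remains a time-shifted copy of the GS one.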
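Understanding to what extent neural networks memorize training data is an intriguing question with practical and theoretical implications. In this paper, we show that in some cases a significant fraction of the training data can in fact be reconstructed from the parameters of a trained neural network classifier. We propose a novel reconstruction scheme that stems from recent theoretical results about the implicit bias in training neural networks with gradient-based methods. To the best of our knowledge, our results are the first to show that reconstructing a large portion of the actual training samples from a trained neural network classifier is generally possible. This has negative implications for privacy, as it can be used as an attack to reveal sensitive training data. We demonstrate our method for binary MLP classifiers on a few standard computer vision datasets.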
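The abstract does not spell out the reconstruction objective. One implicit-bias result of this kind (for homogeneous networks trained with logistic or exponential loss) says the parameters converge in direction to a KKT point of a margin-maximization problem, so that θ ≈ Σ_i λ_i y_i ∇_θ f(x_i; θ) with λ_i ≥ 0. Assuming that condition is the one exploited, the sketch below evaluates its residual for candidate samples on a bias-free two-layer ReLU network with hand-written gradients. The network form and all function names are illustrative assumptions, and the actual optimization over x_i and λ_i (e.g. with automatic differentiation) is omitted.

```python
import numpy as np


def mlp_output(U, v, x):
    """Bias-free two-layer ReLU net f(x) = sum_j v_j * relu(u_j . x) (homogeneous)."""
    return float(v @ np.maximum(U @ x, 0.0))


def param_gradient(U, v, x):
    """Analytic gradient of f(x) with respect to the parameters (U, v)."""
    pre = U @ x
    active = (pre > 0).astype(float)
    grad_U = (v * active)[:, None] * x[None, :]
    grad_v = np.maximum(pre, 0.0)
    return grad_U, grad_v


def kkt_residual_loss(U, v, xs, ys, lams):
    """|| theta - sum_i lam_i * y_i * grad_theta f(x_i) ||^2 for candidate samples.

    A reconstruction attack would minimize this over the candidate xs and
    non-negative lams while keeping the trained (U, v) fixed.
    """
    res_U = U.astype(float).copy()
    res_v = v.astype(float).copy()
    for x, y, lam in zip(xs, ys, lams):
        g_U, g_v = param_gradient(U, v, x)
        res_U -= lam * y * g_U
        res_v -= lam * y * g_v
    return float(np.sum(res_U ** 2) + np.sum(res_v ** 2))
```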
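Reconstructing natural videos from fMRI brain recordings is very challenging, for two main reasons: (i) because fMRI data acquisition is difficult, only a limited number of supervised samples is available, which is not enough to cover the huge space of natural videos; (ii) the temporal resolution of fMRI recordings is much lower than the frame rate of natural videos. In this paper, we propose a self-supervised approach to natural-movie reconstruction. By employing cycle consistency over encoding and decoding of natural videos, we can: (i) exploit the full frame rate of the training videos, rather than being limited to clips that correspond to fMRI recordings; (ii) exploit large amounts of external natural videos that the subjects never saw inside the fMRI machine. These increase the applicable training data by several orders of magnitude and introduce natural-video priors, as well as temporal coherence, into the decoding network. Our approach significantly outperforms competing methods, since those are trained only on the limited supervised data. We further introduce a new and simple temporal prior for natural videos which, when folded into our fMRI decoder, allows us to reconstruct videos at a high frame rate (HFR) of x8 the original fMRI sample rate.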
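The cycle-consistency idea can be sketched with hypothetical linear maps standing in for the encoder E (video clip to fMRI) and the decoder D (fMRI to video clip). This is an illustrative assumption, not the paper's architecture or loss weighting: paired clips give supervised terms, while external clips, sampled at every frame rather than only at fMRI-aligned positions, contribute through the video-to-fMRI-to-video cycle, which needs no recordings.

```python
import numpy as np


def clip_windows(video, window, stride):
    """Flattened windows of `window` consecutive frames, one every `stride` frames.

    stride == window mimics clips aligned to fMRI samples; stride == 1 uses every
    start frame, i.e. the full frame rate of the video.
    """
    starts = range(0, video.shape[0] - window + 1, stride)
    return np.stack([video[s:s + window].ravel() for s in starts])


def mse(a, b):
    return float(np.mean((a - b) ** 2))


def cycle_losses(E, D, paired_clips, paired_fmri, unpaired_clips):
    """Losses for hypothetical linear maps E: clip -> fMRI and D: fMRI -> clip.

    paired_*: clips the subject watched and the matching fMRI samples (rows align).
    unpaired_clips: external clips never shown in the scanner; only the
    video -> fMRI -> video cycle applies to them.
    """
    def encode(clips):
        return clips @ E.T

    def decode(fmri):
        return fmri @ D.T

    supervised = mse(encode(paired_clips), paired_fmri) + mse(decode(paired_fmri), paired_clips)
    video_cycle = mse(decode(encode(unpaired_clips)), unpaired_clips)
    return {"supervised": supervised, "video_cycle": video_cycle}
```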
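Despite remarkable progress on visual recognition tasks, deep neural networks still struggle to generalize when training data is scarce or highly imbalanced, making them highly vulnerable to real-world examples. In this paper, we propose a surprisingly simple yet highly effective method to mitigate this limitation: using pure noise images as additional training data. Unlike the common use of additive noise or adversarial noise, we take an entirely different perspective by training directly on pure random noise images. We present a new Distribution-Aware Routing Batch Normalization layer (DAR-BN), which enables training on pure noise images in addition to natural images within the same network. This encourages generalization and suppresses overfitting. Our proposed method significantly improves imbalanced classification performance, obtaining state-of-the-art results on a large variety of long-tailed image classification datasets (CIFAR-10-LT, CIFAR-100-LT, ImageNet-LT, and CelebA-5). Furthermore, our method is extremely simple and easy to use as a general new augmentation tool (on top of existing augmentations), and can be incorporated into any training scheme. It does not require any specialized data generation or training procedures, thus keeping training fast and efficient.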
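A minimal sketch of the routing idea in a DAR-BN-style layer, in training mode, for activations of shape (N, C). It assumes that noise images are normalized with their own batch statistics and their own affine parameters, so they do not shift the statistics used for natural images; the parameter names and the choice of separate affine parameters are assumptions, not the exact layer from the paper. Running statistics for inference are not handled here.

```python
import numpy as np


def dar_bn_forward(x, is_noise, params, eps=1e-5):
    """Training-mode forward pass of a hypothetical DAR-BN-style layer.

    x: (N, C) activations; is_noise: (N,) boolean mask marking pure-noise images.
    params: dict with 'gamma', 'beta' (natural images) and 'gamma_noise',
    'beta_noise' (noise images), each of shape (C,).
    """
    out = np.empty_like(x, dtype=float)
    routes = (
        (~is_noise, params["gamma"], params["beta"]),
        (is_noise, params["gamma_noise"], params["beta_noise"]),
    )
    for mask, gamma, beta in routes:
        if not mask.any():
            continue
        group = x[mask]
        # Each source is normalized only with its own batch statistics.
        mean = group.mean(axis=0)
        var = group.var(axis=0)
        out[mask] = gamma * (group - mean) / np.sqrt(var + eps) + beta
    return out
```

With a single shared batch norm, the very different statistics of noise images would leak into the normalization of natural images; routing keeps the two distributions apart.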
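GANs trained on a single video are able to perform generation and manipulation tasks. However, these single-video GANs require an unreasonable amount of time to train on a single video, which makes them almost impractical. In this paper, we question the necessity of a GAN for generation from a single video, and introduce a non-parametric baseline for a variety of generation and manipulation tasks. We revive classical space-time patch nearest-neighbor approaches and adapt them into a scalable unconditional generative model, without any learning. This simple baseline surprisingly outperforms single-video GANs in visual quality and realism (as confirmed by quantitative and qualitative evaluations), and is disproportionately faster (runtime is reduced from days to seconds). Beyond diverse video generation, we demonstrate other applications using the same framework, including video analogies and spatio-temporal retargeting. Our proposed approach scales easily to Full-HD videos. These observations show that classical approaches, if adapted correctly, significantly outperform heavy deep learning machinery on these tasks. This sets a new baseline for single-video generation and manipulation tasks and, no less important, makes diverse generation from a single video practically possible for the first time.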
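The patch nearest-neighbor step at the heart of such a non-parametric generator can be sketched in a few lines. This is a single-scale, brute-force illustration under assumptions (grayscale (T, H, W) video, L2 patch distance, plain averaging of overlapping patches); a full generator would presumably repeat it over a coarse-to-fine pyramid, start from a noisy coarse version of the input to obtain diversity, and use a faster nearest-neighbor search. Function names are hypothetical.

```python
import numpy as np


def extract_patches(video, size):
    """All overlapping space-time patches of a (T, H, W) video, flattened row-major."""
    windows = np.lib.stride_tricks.sliding_window_view(video, size)
    return windows.reshape(-1, int(np.prod(size)))


def patch_nn_step(query, source, size=(3, 3, 3)):
    """One patch nearest-neighbor step at a single scale.

    Every space-time patch of `query` is replaced by its closest (L2) patch from
    `source`; overlapping replacements are averaged back into a video.
    """
    q = extract_patches(query.astype(float), size)
    s = extract_patches(source.astype(float), size)
    # Pairwise squared L2 distances between query and source patches.
    d2 = (q ** 2).sum(1)[:, None] - 2.0 * q @ s.T + (s ** 2).sum(1)[None, :]
    nearest = s[np.argmin(d2, axis=1)]

    pt, ph, pw = size
    grid_h = query.shape[1] - ph + 1
    grid_w = query.shape[2] - pw + 1
    out = np.zeros(query.shape)
    weight = np.zeros(query.shape)
    for idx, patch in enumerate(nearest):
        t, rest = divmod(idx, grid_h * grid_w)
        y, x = divmod(rest, grid_w)
        out[t:t + pt, y:y + ph, x:x + pw] += patch.reshape(size)
        weight[t:t + pt, y:y + ph, x:x + pw] += 1.0
    return out / weight
```

Feeding the source video as its own query returns it unchanged, since every patch is its own nearest neighbor; for any other query, the output is assembled only from source patches.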
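Image classification models can depend on multiple different semantic attributes of the image. An explanation of a classifier's decision needs to both discover and visualize these attributes. Here we present StylEx, a method for doing this by training a generative model to specifically explain multiple attributes that underlie classifier decisions. A natural source for such attributes is the StyleSpace of StyleGAN, which is known to generate semantically meaningful dimensions in the image. However, because standard GAN training does not depend on the classifier, it may not represent the attributes that are important for the classifier's decision, and the dimensions of StyleSpace may represent irrelevant attributes. To overcome this, we propose a training procedure for StyleGAN that incorporates the classifier model in order to learn a classifier-specific StyleSpace. Explanatory attributes are then selected from this space. These can be used to visualize the effect of changing multiple attributes per image, thus providing image-specific explanations. We apply StylEx to multiple domains, including animals, leaves, faces, and retinal images. For these, we show how an image can be modified in different ways to change its classifier output. Our results show that the method finds attributes that align well with semantic ones, generates meaningful image-specific explanations, and is human-interpretable as measured in user studies.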
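How explanatory attributes might be picked from a classifier-specific StyleSpace can be illustrated with a simplified ranking: perturb one style coordinate at a time and measure how much the classifier's probability moves. This is an assumed stand-in, not StylEx's actual selection procedure; `classify_from_style` is a hypothetical callable that would generate an image from a style vector and run the classifier on it.

```python
from typing import Callable, List

import numpy as np


def rank_style_coordinates(styles: np.ndarray,
                           classify_from_style: Callable[[np.ndarray], float],
                           delta: float = 1.0,
                           top_k: int = 3) -> List[int]:
    """Rank style coordinates by how strongly perturbing them moves the classifier.

    styles: (n_images, n_coords) style vectors of the images to explain.
    classify_from_style: hypothetical map from a style vector to a class probability.
    """
    n_images, n_coords = styles.shape
    effect = np.zeros(n_coords)
    for s in styles:
        base = classify_from_style(s)
        for c in range(n_coords):
            for sign in (1.0, -1.0):
                shifted = s.copy()
                shifted[c] += sign * delta
                effect[c] += abs(classify_from_style(shifted) - base)
    effect /= n_images  # average effect per image
    return [int(c) for c in np.argsort(-effect)[:top_k]]
```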
We wish to automatically predict the "speediness" of moving objects in videos: whether they move faster than, at, or slower than their "natural" speed. The core component in our approach is SpeedNet, a novel deep network trained to detect whether a video is playing at normal rate or is sped up. SpeedNet is trained on a large corpus of natural videos in a self-supervised manner, without requiring any manual annotations. We show how this single binary classification network can be used to detect arbitrary rates of speediness of objects. We demonstrate SpeedNet's predictions on a wide range of videos containing complex natural motions, and examine the visual cues it uses to make those predictions. Importantly, we show that through predicting the speed of videos, the model learns a powerful and meaningful space-time representation that goes beyond simple motion cues. We demonstrate how those learned features can boost the performance of self-supervised action recognition and can be used for video retrieval. Furthermore, we apply SpeedNet to generate time-varying, adaptive video speedups, which allow viewers to watch videos faster but with less of the jittery, unnatural motion typical of videos that are sped up uniformly.
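SpeedNet's self-supervised labels come for free from the videos themselves. The sketch below assumes that "sped up" is simulated by keeping every other frame (2x), and that a video's speediness can be read off as the largest subsampling factor the classifier still judges as normal speed. Both choices, and the hypothetical `prob_sped_up` classifier, are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np


def make_training_example(video, clip_len, rng):
    """Self-supervised sample: label 0 = normal speed, label 1 = sped up.

    Assumption: 'sped up' is simulated by keeping every other frame (2x).
    """
    label = int(rng.integers(2))
    stride = 2 if label else 1
    span = (clip_len - 1) * stride + 1
    start = int(rng.integers(video.shape[0] - span + 1))
    clip = video[start:start + span:stride]
    return clip, label


def estimate_speediness(video, clip_len, prob_sped_up, factors=(1, 2, 3, 4), threshold=0.5):
    """Largest subsampling factor at which the clip still looks like normal speed.

    prob_sped_up: hypothetical trained binary classifier, clip -> P(sped up).
    """
    best = None
    for f in factors:
        clip = video[:(clip_len - 1) * f + 1:f]
        if prob_sped_up(clip) >= threshold:
            break  # stop at the first factor that already looks sped up
        best = f
    return best
```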
Deep neural networks (DNNs) achieve outstanding performance in various applications. Despite numerous efforts by the research community, out-of-distribution (OOD) samples remain a significant limitation of DNN classifiers. The ability to identify previously unseen inputs as novel is crucial in safety-critical applications such as self-driving cars, unmanned aerial vehicles, and robots. Existing approaches to detecting OOD samples treat a DNN as a black box and assess the confidence score of its output predictions. Unfortunately, this approach frequently fails, because DNNs are not trained to reduce their confidence for OOD inputs. In this work, we introduce a novel method for OOD detection, motivated by a theoretical analysis of neuron activation patterns (NAP) in ReLU-based architectures. Because the activation patterns extracted from convolutional layers are represented in binary form, the method adds little computational workload. An extensive empirical evaluation demonstrates its high performance across various DNN architectures and seven image datasets.
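A rough sketch of pattern-based OOD scoring: binarize a ReLU feature map into an activation pattern, then score a test input by its minimum Hamming distance to patterns stored from the training set. The per-channel max pooling, the zero threshold, and the fixed decision threshold are assumptions for illustration; the actual method may pool, quantize, and compare patterns differently.

```python
import numpy as np


def activation_pattern(feature_map):
    """Binary pattern from a ReLU conv feature map (C, H, W).

    Assumption: a channel counts as active if its maximum response is positive.
    """
    return feature_map.reshape(feature_map.shape[0], -1).max(axis=1) > 0


def ood_score(pattern, train_patterns):
    """Minimum Hamming distance from a test pattern to stored training patterns."""
    return int(np.min(np.count_nonzero(train_patterns != pattern, axis=1)))


def is_ood(pattern, train_patterns, threshold):
    """Flag inputs whose pattern is farther than `threshold` bits from all training patterns."""
    return ood_score(pattern, train_patterns) > threshold
```

Because patterns are boolean vectors, they can be stored compactly (for example packed into bytes) and compared with cheap bitwise operations, which is what keeps this kind of detector lightweight.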
Imperfect information games (IIG) are games in which each player only partially observes the current game state. We study how to learn $\epsilon$-optimal strategies in a zero-sum IIG through self-play with trajectory feedback. We give a problem-independent lower bound $\mathcal{O}(H(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ on the number of realizations required to learn these strategies with high probability, where $H$ is the length of the game and $A_{\mathcal{X}}$ and $B_{\mathcal{Y}}$ are the total numbers of actions of the two players. We also propose two Follow the Regularized Leader (FTRL) algorithms for this setting: Balanced-FTRL, which matches this lower bound but requires knowledge of the information set structure beforehand to define the regularization; and Adaptive-FTRL, which needs $\mathcal{O}(H^2(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ plays without this requirement, by progressively adapting the regularization to the observations.
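For intuition about the FTRL template, the sketch below implements its textbook form on a single decision point: with a negative-entropy regularizer, the FTRL strategy has the closed form softmax(-η L), where L is the vector of cumulative losses. Balanced-FTRL and Adaptive-FTRL instead regularize over the whole information-set tree and work from trajectory feedback, which this toy version does not attempt; the function names are hypothetical.

```python
import numpy as np


def ftrl_entropy(cumulative_losses, eta):
    """FTRL with a negative-entropy regularizer on the probability simplex.

    argmin_p <p, L> + (1 / eta) * sum_a p_a log p_a  has the solution softmax(-eta * L).
    """
    z = -eta * np.asarray(cumulative_losses, dtype=float)
    z -= z.max()  # shift for numerical stability; does not change the softmax
    p = np.exp(z)
    return p / p.sum()


def run_ftrl(loss_sequence, eta):
    """Play FTRL against a sequence of loss vectors; return each round's strategy."""
    n_actions = len(loss_sequence[0])
    cum = np.zeros(n_actions)
    strategies = []
    for loss in loss_sequence:
        strategies.append(ftrl_entropy(cum, eta))  # decide before seeing this loss
        cum += np.asarray(loss, dtype=float)
    return np.array(strategies)
```

Against a loss sequence that always penalizes the first action, the strategy starts uniform and shifts mass steadily toward the second action.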
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice and the bottlenecks the community faces in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while receiving prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for method development. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.